Thermal folding Molecular Dynamics simulations of domain C5 from Myosin Binding Protein C were performed using a native-centric model to study the role of three mutations related to Familial Hypertrophic Cardiomyopathy. Mutation of Asn755 causes the largest shift of the folding temperature, and the residue is located in the CFGA' β-sheet, which features the highest Φ-values. The mutation thus appears to reduce the thermodynamic stability, in agreement with experimental data. The mutations on Arg654 and Arg668, conversely, cause little change in the folding temperature and reside in the low-Φ-value BDE β-sheet, so their pathological role cannot be related to impairment of the folding process but possibly to binding with target molecules. As the typical signature of domain C5 is the presence of a CD-loop that is longer and more destabilizing than in other Ig-like domains, we completed the work with a bioinformatic analysis of this loop, which shows a high density of negative charge and low hydrophobicity. This indicates that the CD-loop is a natively unfolded sequence, with a likely coupling between folding and ligand binding.
I. INTRODUCTION

Familial Hypertrophic Cardiomyopathy (FHC) is a genetic disease causing significant impairment of cardiac function and premature death in children and young adults [1]. A number of mutations in genes encoding cardiac sarcomeric proteins, including β-myosin heavy chain, cardiac troponin T, titin, and cardiac myosin binding protein C (MyBP-C), have been found to correlate with the disease […]. FHC patients with MyBP-C mutations represent 20-45% of the total […], so that mutations in this protein are the second most common cause of the disease. While nonsense mutations in the MyBP-C gene cause premature termination of translation of the C-terminus and result in a mild phenotype […], a number of missense mutations lead to a severe phenotype, and the precise mechanism through which they cause the disease is still unknown [13, …]. MyBP-C is a linear sequence of 11 Ig-like and fibronectin-like domains, referred to as C0-C10, that acts as a potential regulator of cardiac contractility […]. According to the Moolman-Smook model […], three MyBP-C molecules form a ring around the thick filament (Figure 1). The collar is stabilized by specific interactions between domains C5-C8 of one molecule and domains C7-C10 of the neighboring one. The amino-terminal region between domains C0 and C4 protrudes out of the thick filament and, upon phosphorylation, interacts with subfragment 2 of myosin (S2), acting as a brake on muscular contraction […]. Mutations in MyBP-C might either prevent the C0-C4 region from interacting with the S2 fragment (hyper-contractility) or force a carboxy-truncated mutant to interact permanently with S2 (hypo-contractility) […].

Our work is concerned with the folding behavior of domain C5, whose structure was solved by NMR by Idowu et al. [16]. This domain belongs to the IgI set of the immunoglobulin superfamily and features a typical β-sandwich structure with two twisted β-sheets closely packed against each other. The first β-sheet (β1) is formed by strands C, F, G and A', while the second β-sheet (β2) comprises strands B, D and E (see Fig. 5). A remarkable peculiarity of the C5 domain of the cardiac MyBP-C isoform is the presence of two long insertions that are absent in the fast and slow skeletal isoforms. The first insertion is 10 residues long and is located in the linker between the C4 and C5 domains; the second insertion is 28 residues long and resides in the CD loop […].

A recent experimental work [16] showed that the N-terminal region containing the first insertion is not just a linker between C4 and C5, but plays an important role in the thermodynamic stability of the domain. Conversely, the long, highly mobile, proline-rich CD-loop destabilizes the protein, lowering its folding temperature compared with other Ig domains […]. This loop is suspected to form an SH3-domain recognition sequence, presumably binding to the CaM-II-like kinase that co-purifies with MyBP-C.

Three FHC-causing mutations have been identified in the C5 domain: Asn755Lys, Arg654His and Arg668His. The first lies on the FG-loop and leads to a significant destabilization of the protein, yielding a severe phenotype [12, 16, 19]. The Arg654His and Arg668His mutations, associated with a much milder phenotype, are reported not to impair the thermodynamic stability of the protein. Residue Arg654 is actually suspected to regulate the specificity of binding of positively charged substrates, as it is located on the negatively charged CFGA' face, a potential target for binding with domain C8 [13, …]. A role in ligand binding is also postulated for Arg668 […].

The purpose of the present work is to investigate, through MD simulations, the role of the three known FHC-causing mutations mentioned above. Clarke and coworkers showed that, for proteins belonging to the Ig superfamily, both the transition and native states are stabilized by the same contacts dictated by protein topology [21]. This implies that members of the Ig superfamily share similar folding pathways, mainly determined by their common geometry. A native-centric approach therefore seems the natural framework to investigate the folding properties of the Ig-like C5 domain. To incorporate topology and some specific chemical features, as well as the effect of side-chain packing, we adopted a heavy-map Go model in which native contacts are identified on the basis of the steric hindrance of side chains. Moreover, we introduced some heterogeneity in the energetic couplings of the Go force field. This approach is particularly suitable for the C5 domain because we need to discriminate the effects of mutations such as Arg654His and Arg668His, which are modeled through the removal of the same number of native contacts.

The importance of introducing energetic heterogeneity into a Go model for a reliable mutation analysis is shown, for example, by the work of Clementi et al. [22], where all the available experimental data on free-energy differences upon single mutations of ribosomal protein S6 and its circular permutants were reproduced with correlation coefficients larger than 0.9.



II. METHODS 

FIG. 1: Moolman-Smook model of MyBP-C arrangement in the sarcomere; the trimeric collar surrounding the thick filament is stabilized by interactions between domains C5-C8 and C7-C10.

When the folding process is mainly driven by the topological constraints of the native state, it is convenient to use simplified coarse-grained models describing the protein molecule as a chain of beads centered on the Cα carbon positions […]. The Go force field introduces a bias towards the native structure, rewarding native-like interactions through Lennard-Jones attractive forces and appropriate angular potentials embodying secondary-structure motifs. The approach assigns the lowest energy to the native state and minimizes frustration, yielding a perfectly funneled landscape. We consider a light variant of the Go-like force field proposed by Clementi et al. and used in several other papers [29, 30, 31]. The model is defined by the potential energy [32]:
$$
V = \sum_{i} k_r \left(r_{i,i+1} - R_{i,i+1}\right)^2 + \sum_{i} k_\theta \left(\theta_i - \theta_i^0\right)^2 + \sum_{i} \left\{ k_\phi^{(1)}\left[1 - \cos\left(\phi_i - \phi_i^0\right)\right] + k_\phi^{(3)}\left[1 - \cos 3\left(\phi_i - \phi_i^0\right)\right] \right\} + \sum_{i,\, j>i+3} V_{\rm nb}(r_{ij}) \qquad (1)
$$

where the last potential, corresponding to non-bonded interactions, is

$$
V_{\rm nb}(r_{ij}) = \begin{cases} \epsilon_{ij}\left[5\left(\dfrac{R_{ij}}{r_{ij}}\right)^{12} - 6\left(\dfrac{R_{ij}}{r_{ij}}\right)^{10}\right] & i-j \ \text{native} \\[2mm] \epsilon_r \left(\dfrac{\sigma}{r_{ij}}\right)^{12} & i-j \ \text{not native} \end{cases} \qquad (2)
$$

In the above formulas, r_ij is the distance between residues i and j, θ_i is the bending angle identified by the three consecutive Cα's i−1, i, i+1, and φ_i is the dihedral angle defined by the two planes formed by the four consecutive Cα's i−2, i−1, i, i+1. The symbols with the superscript 0 and R_ij are the corresponding quantities in the native conformation. The force-field parameters are proportional to the energy scale ε0 = 0.3 kcal/mol, such that k_r = 1000 ε0/r0² (r0 = 3.8 Å), k_θ = 20 ε0, k_φ^(1) = ε0 and k_φ^(3) = 0.5 ε0. The parameters of the repulsive Lennard-Jones terms between non-native contacts are chosen as σ = 5.0 Å and ε_r = (2/3) ε0. Two residues i and j are considered to interact attractively whenever their side chains have at least one pair of heavy atoms closer than a cutoff distance R_c = 5 Å. Accordingly, the attractive native interactions depend on the coefficients ε_ij = ε0 (1 + n_ij/n_max), where n_ij is the number of atomic contacts between residues i and j in their native positions and n_max = 43 is the maximum value of n_ij in the set of native contacts, corresponding to the pair Lys45-Tyr109. We performed constant-temperature MD simulations employing the isokinetic thermostat […] with time step h = 5 × 10⁻³ τ, where the time unit τ = σ√(M/ε0) = 4.67 ps (M is the average mass of an amino acid residue, estimated at 110 Da).
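To make the definitions above concrete, here is a minimal Python sketch of the non-bonded term of Eq. (2), the heterogeneous couplings ε_ij, the heavy-atom contact count n_ij and the time unit τ. It assumes the parameter values quoted in this section (ε0 = 0.3 kcal/mol, σ = 5 Å, ε_r = 2/3 ε0, n_max = 43, R_c = 5 Å, M = 110 Da) and 1 kcal = 4184 J; all function names are hypothetical and are not taken from the authors' code.

```python
import math

# Parameter values quoted in the Methods section (energies in kcal/mol, lengths in angstrom).
EPS0 = 0.3                 # energy scale epsilon_0
SIGMA = 5.0                # repulsive core radius for non-native pairs
EPS_R = 2.0 / 3.0 * EPS0   # repulsive strength epsilon_r
N_MAX = 43                 # largest heavy-atom contact count (Lys45-Tyr109)
CUTOFF = 5.0               # heavy-atom distance cutoff R_c


def count_atomic_contacts(atoms_i, atoms_j, cutoff=CUTOFF):
    """Number of side-chain heavy-atom pairs (one atom from each residue) closer than cutoff."""
    return sum(1 for a in atoms_i for b in atoms_j if math.dist(a, b) < cutoff)


def native_coupling(n_ij, n_max=N_MAX, eps0=EPS0):
    """Heterogeneous native coupling eps_ij = eps0 * (1 + n_ij / n_max)."""
    return eps0 * (1.0 + n_ij / n_max)


def v_nonbonded(r, native, r_native=0.0, eps_ij=EPS0):
    """Eq. (2): 12-10 well for native pairs, purely repulsive r^-12 term otherwise."""
    if native:
        x = r_native / r
        return eps_ij * (5.0 * x**12 - 6.0 * x**10)
    return EPS_R * (SIGMA / r) ** 12


def time_unit_ps(mass_da=110.0):
    """tau = sigma * sqrt(M / eps0), converted to picoseconds."""
    mass = mass_da * 1e-3          # kg/mol
    energy = EPS0 * 4184.0         # J/mol
    return SIGMA * 1e-10 * math.sqrt(mass / energy) * 1e12


# A native pair sitting exactly at its native distance contributes -eps_ij.
print(v_nonbonded(4.0, True, r_native=4.0, eps_ij=native_coupling(43)))  # -0.6
print(round(time_unit_ps(), 2))  # 4.68 with 1 kcal = 4184 J; the text quotes 4.67 ps
```

The 12-10 form has its minimum, of depth ε_ij, exactly at r_ij = R_ij, which is what makes the native geometry the global energy minimum; the small difference in τ only reflects the unit conversion assumed here.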



As a remark, we observe that several strategies can be employed to introduce heterogeneity. A very common choice is to use the set of parameters derived by Miyazawa and Jernigan […]. Other authors […] prefer to tune the energy parameters through a design procedure based on energy-gap maximization. Our strategy of rescaling the contact energies according to the number of atomic contacts is grounded on experimental and theoretical evidence. In particular, the mutation analysis on barnase by Serrano et al. [36] showed a non-trivial correlation between the destabilization induced by a mutation and the number of methyl or methylene side groups surrounding the deleted group. Moreover, Kurochkina and Lee [37] found that the pairwise sum of the buried surface area is linearly related to the true buried area, as computed with the algorithm of Lee and Richards [38], and to the contact potential of Miyazawa and Jernigan […]. The approach proposed by Kurochkina was then followed by Sung [39] for an efficient modeling of the hydrophobic effect in a Monte Carlo study of β-hairpin folding. A significant correlation between the average contribution of individual residues to folding stability and the buried ASA was also noticed by Zhou and Zhou [40]. To check that the heterogeneity we have introduced does not lead to excessive frustration in the energy landscape, we performed rapid quenching simulations to collect a data set of 10^… decoys. We then estimated the ratio Tg/Tf between the glass-transition and folding temperatures of our protein as the ratio of the energy standard deviation of the decoy set to the energy gap. We found that this quantity, which measures the frustration of the energy landscape, remained substantially unchanged from the heterogeneous Go model, Tg/Tf = 0.34, to the homogeneous one, Tg/Tf = 0.35.
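The frustration estimate described above reduces to one ratio. A minimal sketch follows, assuming that the "energy gap" is the mean decoy energy minus the native energy and that the population standard deviation is used; the text does not state either choice, and the toy energies below are not data from this study.

```python
import statistics


def frustration_ratio(decoy_energies, native_energy):
    """Estimate Tg/Tf as (std of decoy energies) / (energy gap).

    Assumption: the gap is the mean decoy energy minus the native energy.
    """
    spread = statistics.pstdev(decoy_energies)
    gap = statistics.fmean(decoy_energies) - native_energy
    if gap <= 0:
        raise ValueError("the native energy must lie below the mean decoy energy")
    return spread / gap


# Toy numbers (not data from the paper): std = sqrt(2), gap = 4.
print(frustration_ratio([-10.0, -12.0, -8.0, -10.0], native_energy=-14.0))  # about 0.354
```

A small ratio means the native state sits well below the spread of misfolded decoys, which is the sense in which values of 0.34 and 0.35 indicate comparably weak frustration.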



A customary indicator of the native-likeness of residues in the transition state (TS) is represented by the Φ-values: a value Φ ≈ 1 characterizes residues establishing native-like interactions already in the TS, whereas a value close to zero is typical of residues involved in a disordered conformation in the TS. We apply the free-energy perturbation (FEP) technique to evaluate the Φ-values from our MD simulations:

$$
\Phi = \frac{\log\langle e^{-\Delta E/RT}\rangle_{TS} - \log\langle e^{-\Delta E/RT}\rangle_{U}}{\log\langle e^{-\Delta E/RT}\rangle_{F} - \log\langle e^{-\Delta E/RT}\rangle_{U}} \qquad (3)
$$

where the Boltzmann factors depend on the energy difference ΔE between the mutant and the wild type (WT), and the averages are computed over WT conformations of the folded (F), transition-state (TS) and unfolded (U) ensembles. In the present paper, the Φ-values are computed according to Eq. (3) using a method developed by Clementi et al. […] that can be summarized in the following steps: i) determination of the folding temperature Tf from the specific heat plot; ii) analysis of the free-energy profile at temperature Tf as a function of a suitable folding reaction coordinate, where the double-well free-energy profile of a two-state folder allows one to define three windows of the reaction coordinate identifying the folded, transition-state and unfolded ensembles, respectively; iii) dynamic simulation at T = Tf and storage of conformations belonging to the F, TS and U ensembles; iv) choice of mutations and computation of the FEP Φ-values.

Structural information about the native-likeness of the transition state was also gained from the so-called structural Φ-values:

$$
\Phi^{\rm struct}_i = \frac{\sum_{j \in C(i)} P_{TS}(i,j)}{\sum_{j \in C(i)} P_F(i,j)} \qquad (4)
$$

where P_F(i,j) and P_TS(i,j) are the frequencies of the native contact i−j in the folded and transition ensembles, respectively, and the sum runs over the set C(i) of native contacts in which residue i is involved.

An interesting property of the TS is the existence of a few key residues acting as nucleation centers for the folding process. Following an approach proposed by Vendruscolo et al. [41], the importance of these residues can be better understood by portraying the protein as a weighted graph. Residues represent the vertices, and the weighted edges are defined as W_ij = 1/A_ij, where A_ij is the fraction of TS-ensemble structures in which residues i and j are in contact. Using Dijkstra's algorithm [42], we computed the minimal path X_ij, i.e. the route between i and j that minimizes the sum of the weights W_kl of the edges traversed. The fraction of minimal paths passing through residue k defines the betweenness of that residue. This quantity therefore measures the centrality of a residue: residues with a high betweenness act as "hubs" in the network and presumably play a crucial role in the stabilization of the transition state.
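Eqs. (3) and (4) can be sketched in a few lines of Python. The sketch assumes that, for each stored WT conformation, the energy change ΔE produced by the mutation (in the Go model, turning the residue's native contacts into non-native ones) has already been computed, and that each conformation is represented by the set of native contacts it forms; the function names are illustrative only.

```python
import math


def log_mean_boltzmann(delta_e, rt):
    """log < exp(-dE / RT) > over one ensemble, evaluated with log-sum-exp for stability."""
    xs = [-de / rt for de in delta_e]
    top = max(xs)
    return top + math.log(sum(math.exp(x - top) for x in xs) / len(xs))


def fep_phi(de_ts, de_f, de_u, rt):
    """Eq. (3): FEP Phi-value from mutant-minus-WT energies sampled in the TS, F and U windows."""
    log_u = log_mean_boltzmann(de_u, rt)
    return (log_mean_boltzmann(de_ts, rt) - log_u) / (log_mean_boltzmann(de_f, rt) - log_u)


def contact_frequencies(ensemble, native_contacts):
    """P(i, j): fraction of conformations (each a set of formed native contacts) forming (i, j)."""
    n = len(ensemble)
    return {c: sum(c in conf for conf in ensemble) / n for c in native_contacts}


def structural_phi(residue, native_contacts, p_ts, p_f):
    """Eq. (4): sum of P_TS over the native contacts of a residue divided by the sum of P_F."""
    partners = [c for c in native_contacts if residue in c]
    return sum(p_ts[c] for c in partners) / sum(p_f[c] for c in partners)


# Toy check: constant energy changes of 0 (U), 0.5 (TS) and 1.0 (F) in units of RT give Phi = 0.5.
print(fep_phi([0.5, 0.5], [1.0, 1.0], [0.0, 0.0], rt=1.0))
```

The log-sum-exp form only guards against overflow of the exponentials; it does not change the value of Eq. (3). Note that structural_phi is undefined for a residue with no native contacts, which such residues would need to be excluded beforehand.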



III. RESULTS 

A thermal folding simulation of the wild-type (WT) C5 domain was performed by gradually cooling the protein from temperature T = 2.5 to T = 1.5 in 50 temperature steps. For each temperature, an equilibration stage of 5 × 10^… time steps was followed by a production stage of 5 × 10^… time steps. A similar schedule was employed to simulate the folding of the three missense mutants Asn115Lys, Arg14His and Arg28His (notice that protein residues have been renumbered 1-130, restricting the numbering to the C5 domain only). Within the framework of the Go model, we implemented the mutation of a residue by turning all its native contacts into non-native ones. The role of the amino-terminal region of the protein was also investigated through folding simulations of a deletion mutant in which the first 7 residues of the C5 domain were removed.

TABLE I: Experimental and simulated folding temperatures of WT domain C5 of MyBP-C and its mutants. The last column reports the cooperativity parameter κ2. N.A. = not available.

Species   Tf, experimental (K)   Tf, simulated (K)   κ2
WT        322.06 ± 0.75          322 ± 3             0.976
Mut14     318.46 ± 1.44          322 ± 3             1.0
Mut28     N.A.                   317 ± 3             0.989
Mut115    309.36 ± 0.41          313 ± 3             1.0
Δ1-7      N.A.                   311 ± 4             1.0

The specific heat profiles displayed in Figure 2 show that the WT C5 domain and the missense mutants fold according to a cooperative, two-state mechanism, as quantified by the van't Hoff criterion […] through the parameter

$$
\kappa_2 = \frac{2 T_f \sqrt{k_B C_v(T_f)}}{\Delta H_{\rm cal}},
$$

which expresses the ratio between the van't Hoff and the calorimetric enthalpies after appropriate baseline subtraction […] in the energy or C_v plots. A value of κ2 close to unity indicates a highly cooperative folding transition. Table I summarizes the κ2 values along with the experimental and theoretical transition temperatures.
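The κ2 criterion can be evaluated directly from a tabulated heat-capacity curve. The sketch below assumes that the baseline has already been subtracted, takes Tf at the C_v peak, integrates the excess C_v with the trapezoidal rule to obtain ΔH_cal, and works in reduced units with k_B = 1 (T in ε0/R, C_v in R, as in Fig. 2). The synthetic curve is an ideal two-state transition used only as a sanity check, not simulation data.

```python
import math


def kappa2(temps, cv_excess, kb=1.0):
    """Van't Hoff / calorimetric enthalpy ratio from a baseline-subtracted Cv(T) curve."""
    # Calorimetric enthalpy: trapezoidal integral of the excess heat capacity.
    dh_cal = sum(0.5 * (cv_excess[k] + cv_excess[k + 1]) * (temps[k + 1] - temps[k])
                 for k in range(len(temps) - 1))
    # Folding temperature taken at the Cv peak.
    k_peak = max(range(len(cv_excess)), key=cv_excess.__getitem__)
    t_f, cv_peak = temps[k_peak], cv_excess[k_peak]
    dh_vh = 2.0 * t_f * math.sqrt(kb * cv_peak)
    return dh_vh / dh_cal


# Synthetic ideal two-state transition (enthalpy 50, midpoint 2.1, reduced units).
dh, tm = 50.0, 2.1
temps = [1.5 + 0.001 * k for k in range(1201)]
cv = []
for t in temps:
    k_eq = math.exp(-dh * (1.0 / t - 1.0 / tm))   # unfolding equilibrium constant
    cv.append(dh**2 / t**2 * k_eq / (1.0 + k_eq) ** 2)
print(round(kappa2(temps, cv), 2))  # close to 1 for a two-state folder
```

For a strictly two-state system the peak height fixes the van't Hoff enthalpy and the area fixes the calorimetric one, so the two agree; intermediate states broaden the peak and push κ2 below 1.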

Figure 2 also shows that a mutation on Arg14 hardly has any effect on the thermodynamic stability of the protein, as the thermogram of this mutant is almost perfectly superposed on that of the wild type. A mutation on Arg28, conversely, shifts the folding temperature to lower values, and the shift in Tf is even larger for a mutation on Asn115. It is worth noticing that the different stability of Mut14 and Mut28 is resolved only with the heterogeneous model, while the peaks of the two C_v plots remain unresolved in the homogeneous Go model. Apart from this difference in resolving power, however, the relative positions of the Tf values remain unchanged in the two models. The destabilizing effect of the mutation on Asn115 emerging from our simulations agrees with the CD and NMR spectra recorded by Idowu et al. [16], showing that the Asn115Lys mutant is unstable and largely unfolded compared with the wild-type C5 motif. The same authors [16] also noticed that the Arg14His mutant is as well folded and as stable as the wild type which, again, is consistent with the good superposition of the thermograms of the wild-type and Arg14 mutants found in our simulations. However, it is important to notice that, while the Asn115Lys mutant appears to be largely unfolded in mutagenesis experiments, a folding simulation using the Go model always ends up in the correct native structure, and the only trace of a mutation is a shift in Tf. This is because the Go model introduces a bias towards the native state strong enough to override the disruptive effect of most mutations.

FIG. 2: Thermal behavior of the heat capacity of the WT C5 domain, the missense mutants deprived of the native contacts of Arg14, Arg28 and Asn115, and the deletion mutant lacking the 1-7 subsequence. Computations were performed by processing the data collected during folding simulations with the weighted histogram method. The temperature is measured in units of ε0/R = 151.1 K and C_v in units of R = 1.9855 × 10⁻³ kcal mol⁻¹ K⁻¹.

Finally, the Δ1-7 deletion mutant appears to be the most destabilized one, as it produces the largest shift in Tf. This result is also consistent with the observation by Idowu and coworkers that the Δ1-7 mutant remains largely unfolded. The effect of the three FHC-related mutations was further investigated through Φ-value analysis. The free-energy profile of the WT protein as a function of the overlap Q (fraction of native contacts) at the folding temperature Tf = 2.1 shows the typical double-well pattern of two-state folders, as illustrated in Figure 3.

The well centered on low overlap values corresponds to the unfolded-state ensemble (U), whereas the well in the high-overlap region is related to the folded-state ensemble (F). The barrier between the two wells represents the transition-state ensemble (TS). Conformations belonging to the F, TS and U ensembles can thus be sampled by choosing three appropriate windows of the reaction coordinate Q (Fig. 3) and used for the computation of Φ-values with the perturbation approach (Methods).
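The sampling step just described can be sketched as follows: compute Q for each stored conformation, build the potential of mean force F(Q) = −T ln P(Q) from a histogram, and assign conformations to the U, TS and F windows quoted in the Fig. 3 caption. The criterion deciding when a native contact counts as formed in a given frame is not specified in the text, so the sketch simply takes each frame's set of formed native contacts as input; names are illustrative.

```python
import math

# Overlap windows quoted in the caption of Fig. 3.
WINDOWS = {"U": (0.0, 0.15), "TS": (0.35, 0.5), "F": (0.75, 0.9)}


def overlap(formed_contacts, native_contacts):
    """Q: fraction of native contacts formed in one conformation."""
    return len(formed_contacts & native_contacts) / len(native_contacts)


def assign_ensemble(q):
    """Return 'U', 'TS' or 'F' if q falls strictly inside a window, otherwise None."""
    for name, (lo, hi) in WINDOWS.items():
        if lo < q < hi:
            return name
    return None


def free_energy_profile(qs, temperature, n_bins=20):
    """Potential of mean force F(Q) = -T ln P(Q) from a histogram of sampled overlaps."""
    counts = [0] * n_bins
    for q in qs:
        counts[min(int(q * n_bins), n_bins - 1)] += 1
    total = len(qs)
    return [-temperature * math.log(c / total) if c else math.inf for c in counts]


native = {(1, 5), (2, 9), (3, 12), (4, 20), (7, 30)}
q = overlap({(1, 5), (3, 12)}, native)
print(q, assign_ensemble(q))  # 0.4 TS
```

Conformations falling between the windows are simply discarded, which keeps the three ensembles well separated along the reaction coordinate.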

Domain C5 turns out to be asymmetric with respect to the distribution of FEP Φ-values: in particular, the sheet containing the longest strands (β1) is characterized by high Φ-values, while the sheet formed by the shorter strands (β2) has low Φ-values (Figure 5). This result agrees with what was suggested by Clarke and coworkers [21]. In fact, the long-stranded sheet β1 derives its high stability from the presence of many contacts chemically corresponding to hydrogen bonds. The contacts contributing most to the stability, however, are the same ones that stabilize the TS. Thus the sheet more important for the stability (β1) is also the sheet whose formation represents the rate-limiting step in the folding kinetics. A more detailed picture of the transition state is provided by Figure 6, which displays a contact map where contact Φ-values are quantified by a color scale. The most important contacts stabilizing the TS are those between strands C and F and between strands F and G. A minor contribution to the stability of the TS is also provided by the contacts linking the central parts of strands B and A' and by Head-Head contacts (where "Head" corresponds to the N-terminal 1-17 segment). Structural Φ-values offer supplementary data complementing the scenario supplied by the FEP analysis. The two indicators, in fact, store different information: high FEP Φ-values identify the residues whose mutation most destabilizes the TS, whereas high structural Φ-values characterize residues in a native-like conformation in the TS, regardless of whether they actively stabilize the structure or are passively driven into a correct native-like conformation by the rearrangement of neighboring regions of the molecule. This pattern was also observed in an experimental study on a fibronectin-like domain reported in Ref. [45]. The peculiar features of the two indicators thus explain why structural Φ-values are systematically higher than FEP Φ-values. The difference between structural and perturbation Φ-values yields a profile whose peaks identify the residues passively driven into a native-like conformation in the TS: these residues are mainly located on strand D and at the boundary between strands F and G. It can also be noticed that Pro12 plays an active role in stabilizing the N-terminal region in a native-like conformation in the TS, while His8 and Gly13 are just passively placed in the correct position.

FIG. 3: Profiles of the potential of mean force versus overlap Q around the folding temperature Tf = 2.1 (curves at T = 2.088, 2.098, 2.108 and 2.117). The figure shows three windows of overlap corresponding to the Unfolded (0 < Q < 0.15), Transition State (0.35 < Q < 0.5) and Folded State Ensembles (0.75 < Q < 0.9).

FIG. 4: Structural and perturbation Φ-values (top panel) compared with the betweenness (bottom panel), plotted against residue index: the similarity between the profiles of Φ-values and betweenness suggests that the key residues stabilizing the TS derive their importance from their centrality in the network of contacts of the protein. The three profiles were computed using the conformations sampled in a run at the folding temperature Tf = 2.1 in the three windows of overlap shown in Fig. 3. "Head" refers to the N-terminal 1-17 region.

FIG. 5: Color-coded distribution of perturbation Φ-values on the structure of the C5 domain. Yellow: Φ < 0.25; Red: 0.25 < Φ < 0.35; Green: 0.35 < Φ < 0.45; Cyan: 0.45 < Φ < 0.55; Blue: 0.55 < Φ < 0.75. The blue and cyan regions, corresponding to the highest Φ-values, are concentrated on strands C, F and G. The BDE sheet, conversely, is characterized by low Φ-values.

FIG. 6: Bond Φ-values. The color-coded contact map shows that the C-F and F-G contacts feature the highest Φ-values and thus provide the most relevant contribution to the stability of the TS. The symbol "H" designates the amino-terminal 1-17 segment.

The analysis of betweenness (Fig. 4) shows that this parameter, representing the fraction of minimal paths passing through a given residue, correlates well with the Φ-values, in agreement with Ref. [41]. In particular, the betweenness confirms the importance of strands C, F and G, but it differs from the Φ-values in two important regions of the protein. Strand E is characterized by a high betweenness, as it is the central strand of the BDE sheet and probably acts as a bridge between strands B and D: the importance of this strand may thus be higher than it appears from Φ-values alone. Conversely, the N-terminal region of the protein is characterized by a low betweenness, so that it appears to be weakly connected with the other parts of the molecule.
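A minimal sketch of the betweenness computation described in Methods follows: edges are weighted by W_ij = 1/A_ij, Dijkstra's algorithm (here with Python's heapq) gives a minimal path for every residue pair, and the betweenness of residue k is the fraction of those paths that pass through k as an interior vertex. Assumptions not stated in the text: pairs are unordered, exactly one minimal path is kept per pair (ties are broken by first discovery), and pairs never in contact in the TS ensemble (A_ij = 0) get no edge. Brandes-style counting of all tied shortest paths would be a reasonable alternative.

```python
import heapq
from collections import defaultdict


def contact_graph(freq):
    """Edges weighted by W_ij = 1 / A_ij; pairs never in contact (A_ij = 0) are omitted."""
    g = defaultdict(dict)
    for (i, j), a in freq.items():
        if a > 0:
            g[i][j] = g[j][i] = 1.0 / a
    return g


def dijkstra_predecessors(g, source):
    """Shortest-path tree from source, returned as a predecessor map."""
    dist = {source: 0.0}
    pred = {source: None}
    heap = [(0.0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist[u]:
            continue  # stale heap entry
        for v, w in g[u].items():
            nd = d + w
            if nd < dist.get(v, float("inf")):
                dist[v] = nd
                pred[v] = u
                heapq.heappush(heap, (nd, v))
    return pred


def betweenness(freq):
    """Fraction of minimal paths (one per residue pair) passing through each residue."""
    g = contact_graph(freq)
    nodes = sorted(g)
    hits = dict.fromkeys(nodes, 0)
    n_paths = 0
    for idx, s in enumerate(nodes):
        pred = dijkstra_predecessors(g, s)
        for t in nodes[idx + 1:]:
            if t not in pred:
                continue  # disconnected pair
            n_paths += 1
            k = pred[t]
            while k != s:  # walk back along the path, counting interior residues only
                hits[k] += 1
                k = pred[k]
    return {k: hits[k] / n_paths for k in nodes} if n_paths else hits


# Toy contact frequencies: residue 2 bridges 1 and 3; the direct 1-3 contact is rare (cost 10).
print(betweenness({(1, 2): 1.0, (2, 3): 1.0, (1, 3): 0.1}))  # residue 2 gets 1/3
```

Because rarely formed contacts become expensive edges, minimal paths are routed through residues whose contacts are frequently formed in the TS, which is why hubs of this graph tend to coincide with high-Φ regions.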

It is instructive to discuss the positions of the three FHC-related mutations within the two β-sheets of the C5 domain. In fact, Asn115, which lies on the sheet characterized by the highest FEP Φ-values, is known to completely disrupt the native structure of this protein when mutated to Lys. On the other hand, Arg14 and Arg28 are located on the sheet with low FEP Φ-values, and their mutation does not significantly affect the thermodynamic stability.

As a final remark, we tested the sensitivity of the betweenness and of the structural and perturbative Φ-values to the choice of the reaction coordinate. The computation protocol for these parameters was repeated using the Kabsch RMSD […] as a collective coordinate for sampling the F, TS and U ensemble structures. The existence of linear relations (Fig. 7), with correlation coefficients of at least 0.8, between the parameters computed using either the Kabsch RMSD or the fraction of native contacts Q indicates that the sampling of the F, TS and U conformations is equivalent for the two methods. Thus the information conveyed by the parameters Φ_FEP, Φ_struct and B_k is statistically robust, because it does not depend strongly on the reaction coordinate used to compute them.

FIG. 7: Linear regression of parameters computed using the Kabsch RMSD and the fraction of native contacts Q. (a) Free-energy perturbative Φ-values computed using the Kabsch RMSD reaction coordinate versus the same quantity computed using the overlap Q; correlation coefficient r = 0.8. (b) Structural Φ-values (Kabsch RMSD) versus Φ_struct(Q); correlation coefficient r = 0.99. (c) Betweenness (Kabsch RMSD) versus B_k(Q); correlation coefficient r = 0.90.
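For reference, the Kabsch RMSD used as the alternative reaction coordinate can be computed as below: both Cα sets are centered, the optimal rotation is obtained from the SVD of their covariance matrix (with a sign correction that excludes reflections), and the RMSD is evaluated after superposition. This is a generic NumPy sketch assuming a one-to-one correspondence between the two coordinate sets, not the authors' implementation; the helper pearson_r shows how a correlation coefficient like those of Fig. 7 could be obtained.

```python
import numpy as np


def kabsch_rmsd(p, q):
    """RMSD between two (N, 3) coordinate sets after optimal rigid superposition (Kabsch)."""
    p = np.asarray(p, dtype=float)
    q = np.asarray(q, dtype=float)
    p = p - p.mean(axis=0)          # remove translations
    q = q - q.mean(axis=0)
    h = p.T @ q                     # 3x3 covariance matrix
    u, _, vt = np.linalg.svd(h)
    d = np.sign(np.linalg.det(vt.T @ u.T))   # -1 if the best orthogonal map is a reflection
    rot = vt.T @ np.diag([1.0, 1.0, d]) @ u.T
    diff = p @ rot.T - q
    return float(np.sqrt((diff**2).sum() / len(p)))


def pearson_r(x, y):
    """Correlation coefficient between two parameter profiles."""
    return float(np.corrcoef(x, y)[0, 1])


# A rotated and translated copy of a structure has zero Kabsch RMSD.
pts = np.array([[0.0, 0, 0], [1, 0, 0], [0, 2, 0], [0, 0, 3], [1, 1, 1]])
th = np.pi / 6
rz = np.array([[np.cos(th), -np.sin(th), 0], [np.sin(th), np.cos(th), 0], [0, 0, 1]])
print(kabsch_rmsd(pts, pts @ rz.T + np.array([5.0, -2.0, 1.0])))  # ~0
```

Restricting the optimal map to proper rotations matters: without the determinant correction, a mirror image of the native fold could be superposed with zero RMSD.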



A. The CD loop 

As already mentioned in Section I, one of the most interesting features of the cardiac isoform of the C5 domain of MyBP-C is the presence of a 28-residue insert that makes the CD loop (residues 47 to 85) significantly longer than the corresponding loop of the skeletal isoforms. Our simulations show that the CD loop is extremely mobile, as it is involved in very few native contacts.

This high mobility of the CD loop is an extremely im- 
portant feature of the C5 domain because it is responsible 
for a folding temperature significantly lower than that of 
most Ig domains flGj. The reasons for this unusual mo- 
bility can be clarified through a simple sequence analy- 
sis. First of all, we analyzed the hydrophobicity along 
the amino acid sequence using the Kyte and Doolittle 
scale [131 • The hydrophobicity of each residue was calcu- 
lated by sliding a 5-residue long window over the protein 
sequence and assigning to the central residue the aver- 
age hydrophobicity computed over the entire window. A 
similar approach was employed for the charge where Glu 
and Asp residues contribute -1, Lys and Arg contribute 
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+1 and the other residues are regarded as being neutral 
at physiological pH. The profiles in Figure [5] show that 
all the loops, and in particular the CD loop, feature an 
excess of negative charge and are less hydrophobic than 
the regions corresponding to the /3-strands. An analysis 



Molecular Region 


(7?) 


(^^> 


(H) Bound 


Full Protein 


0.546 


0.417 


0.609 


CD Loop 


0.507 


0.343 


0.595 


Outside CD Loop 


0.563 


0.450 


0.615 




10 20 30 40 50 60 70 80 90 100 110 120 130 




10 20 30 40 50 60 70 80 90 100 110 120 130 
residue 



FIG. 8: Charge (top panel) and hydrophobicity (bottom 
panel) profiles along the protein sequence. Each value is 
an average over a 5-residue long window shifted along the 
polypeptide chain. Hydrophobicity was computed using the 
Kyte-Doolittle scale; regions with a positive value are hy- 
drophobic. The CD loop appears to have a high density of 
negative charge and is scarcely hydrophobic. "Head" desig- 
nates the 1-17 segment of the protein. 

of the amino acid distribution along the chain, reveals 
a high concentration of charged residues in the CD loop 
where the number of Glu (4) and Asp (5) residues ex- 
ceeds the number of Lys (3) and Arg (2) residues. A 
high concentration of Glu is also found in the EF loop. 
A remarkable feature is also the high concentration of 
Pro in the N-terminal 1-17 region, in the BC and in 
the CD loops. In summary, the CD loop is character- 
ized by a high concentration of the residues identified 
by Garner et al. [H, |4^, [H^l as str ong determinants of 
local disorder. Moreover, Uversky [5ll. [s^ showed that 
the combination of low mean hydrophobicity and rela- 
tively high net charge represents a prerequisite for the 
absence of compact structure in proteins under physio- 
logical conditions. In particular, it was shown that the 
charge-hydrophobicity phase-space can be divided into 
two regions by the empirical separatrix line of equation: 



boundary 



{R) + 1.151 
2.785 



(5) 



where H refers to hydrophobicity and R to the charge. 
The proteins located below this line in the phase- 
space are likely to be unfolded in physiologic conditions 
whereas those lying above the separatrix do fold into a compact, globular conformation. To test this, the mean hydrophobicity and the mean net charge were computed for the CD loop, for the portion of the protein sequence not including this loop, and for the whole protein sequence. Following Uversky, in these calculations the hydrophobicity of individual residues was normalized to a scale of 0 to 1, and the mean hydrophobicity was computed as the sum of the normalized hydrophobicities divided by the number of residues in the protein segment under examination. A similar approach was used for the computation of the mean net charge. Table II shows that the CD loop is located in the natively unfolded region of the phase space. The full sequence of the C5 domain lies closer to the boundary of the natively folded region but does not cross it, because including the CD loop in the averages raises the mean charge and lowers the mean hydrophobicity. Finally, the portion of the polypeptide sequence not including the CD loop is even closer to the boundary, but it still lies in the natively unfolded region because of the features of the minor loops. The properties of the CD sequence were further studied by analyzing the average number of native contacts per amino acid, as suggested in Ref. According to this approach, natively unfolded proteins are thought to form too few native interactions to compensate for the loss of conformational entropy, hence their need to couple folding with specific ligand binding. As a consequence, natively unfolded proteins are expected to feature a lower average number of contacts than globular proteins.

TABLE II: Positions in the charge-hydrophobicity phase space of three molecular regions of Domain C5 of MyBP-C. The CD loop, the full C5 domain and the portion of the protein excluding the CD loop all lie in the natively unfolded region of the phase space below the separatrix (Eq. 5).
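The charge-hydrophobicity test described above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' code. It assumes the Kyte-Doolittle scale rescaled linearly to [0, 1], counts Lys and Arg as +1 and Asp and Glu as -1 (His treated as neutral, roughly pH 7), and uses the boundary published by Uversky, <H>_b = (<R> + 1.151)/2.785, as the separatrix; if the separatrix equation used in this work differs, it should be substituted. All function names are hypothetical.

```python
# Kyte-Doolittle hydropathy scale (raw values range from -4.5 to 4.5).
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}
KD_MIN, KD_MAX = -4.5, 4.5


def mean_hydrophobicity(sequence):
    """Mean hydropathy after rescaling each residue's value to [0, 1]."""
    normalized = [(KYTE_DOOLITTLE[aa] - KD_MIN) / (KD_MAX - KD_MIN) for aa in sequence]
    return sum(normalized) / len(sequence)


def mean_net_charge(sequence):
    """|N(K, R) - N(D, E)| / length; His treated as neutral (assumption)."""
    positive = sum(aa in "KR" for aa in sequence)
    negative = sum(aa in "DE" for aa in sequence)
    return abs(positive - negative) / len(sequence)


def boundary_hydrophobicity(mean_charge):
    """Assumed separatrix: <H>_b = (<R> + 1.151) / 2.785 (Uversky's published boundary)."""
    return (mean_charge + 1.151) / 2.785


def classify(sequence):
    """Points above the separatrix are predicted folded, points below natively unfolded."""
    h = mean_hydrophobicity(sequence)
    r = mean_net_charge(sequence)
    return "folded" if h > boundary_hydrophobicity(r) else "natively unfolded"
```

In use, the CD loop, the C5 sequence without the loop and the full domain would each be passed to `classify`; their sequences are not reproduced here.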

Natively unfolded and globular proteins can also be discriminated through a set of 20 artificial parameters designed through Monte Carlo maximization (5^ of the scoring function

$$\mathrm{Score} = \frac{\langle X_f \rangle - \langle X_u \rangle}{\sqrt{S_f^2 + S_u^2}},$$

where $\langle X_f \rangle$ and $\langle X_u \rangle$ are the mean values of the adjustable parameters in two training datasets of folded and natively unfolded proteins, and $S_f$ and $S_u$ are the corresponding standard deviations.
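The scoring function and its Monte Carlo maximization can be made concrete with a short Python sketch. The text does not give these details, so the following are assumptions: each of the 20 artificial parameters is a value attached to one amino-acid type, a protein's X is the mean of these values over its sequence, S_f and S_u are sample standard deviations, and the Monte Carlo search is a Metropolis walk over the 20 values. Function names are hypothetical, and any sequences passed in would be toy data, not the training sets of the cited work.

```python
import math
import random
from statistics import mean, stdev

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def separation_score(folded_values, unfolded_values):
    """Score = (<X_f> - <X_u>) / sqrt(S_f^2 + S_u^2), with sample standard deviations."""
    s_f, s_u = stdev(folded_values), stdev(unfolded_values)
    return (mean(folded_values) - mean(unfolded_values)) / math.sqrt(s_f**2 + s_u**2)


def protein_value(sequence, params):
    """X for one protein: mean of its per-residue parameters (assumption)."""
    return sum(params[aa] for aa in sequence) / len(sequence)


def score(params, folded, unfolded):
    return separation_score(
        [protein_value(s, params) for s in folded],
        [protein_value(s, params) for s in unfolded],
    )


def monte_carlo_maximize(folded, unfolded, steps=2000, step_size=0.05,
                         temperature=0.01, seed=0):
    """Metropolis search over the 20 per-amino-acid values that maximizes Score."""
    rng = random.Random(seed)
    params = {aa: rng.uniform(-1.0, 1.0) for aa in AMINO_ACIDS}
    current = score(params, folded, unfolded)
    best, best_params = current, dict(params)
    for _ in range(steps):
        aa = rng.choice(AMINO_ACIDS)
        old = params[aa]
        params[aa] += rng.gauss(0.0, step_size)
        trial = score(params, folded, unfolded)
        # Accept uphill moves always, downhill moves with Boltzmann probability.
        if trial >= current or rng.random() < math.exp((trial - current) / temperature):
            current = trial
            if current > best:
                best, best_params = current, dict(params)
        else:
            params[aa] = old  # rejected: restore the previous value
    return best_params, best
```

Because Score is unchanged when all 20 values are multiplied by a positive constant or shifted by a common offset, only the relative values of the optimized parameters carry information.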

We generated the profiles of the average number of contacts per residue and of the artificial parameters by shifting a 5-residue window along the protein sequence and assigning the window average to the central residue, so that the data in Fig. 9 can be compared with the charge and hydrophobicity plots of Fig. 5, which were determined with the same procedure. The profiles of the average number of contacts and of the artificial parameters in Fig. 9 show that both indicators are effective in discriminating β-strands from unstructured loops, the latter being characterized by much lower values of the parameters. These results thus suggest that the CD loop of domain C5 of MyBP-C should be classified as a natively unfolded sequence, i.e., a protein fragment that lacks a stable structure even under physiological conditions. A common feature of natively unfolded proteins is that their folding is usually associated with binding to a specific ligand. It could thus be suggested that the long CD loop, which appears structureless when the C5 domain is dissected from MyBP-C, might actually be well folded in vivo owing to a close interaction with a specific ligand. This hypothesis is further supported by the experimental finding that cardiac MyBP-C co-purifies with the Calmodulin class-II (CaM-II)-like Kinase [H, [M HI, so that the folding of the CD loop may be accompanied by docking with this enzyme.

FIG. 9: Profiles of average number of contacts (top) and artificial parameters (bottom) along the sequence of the C5 domain. The computation was performed using the parameters listed in Table I of Ref. [s^. The loops are characterized by lower values of both indicators with respect to the strand subsequences.
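The 5-residue sliding-window averaging used above for the contact and artificial-parameter profiles can be sketched as follows. The per-residue table is a placeholder for the values of Table I of the cited reference, which are not reproduced here. Because the text does not say how the chain ends are handled, the sketch assumes that residues without a complete window are left undefined (`None`). Function names are hypothetical.

```python
def window_profile(values, width=5):
    """Average over a window centred on each residue; chain ends without a full window get None."""
    if width % 2 == 0:
        raise ValueError("width must be odd so that the window has a central residue")
    half = width // 2
    profile = [None] * len(values)
    for centre in range(half, len(values) - half):
        window = values[centre - half:centre + half + 1]
        profile[centre] = sum(window) / width
    return profile


def residue_profile(sequence, per_residue_table, width=5):
    """Map each amino acid to its tabulated value, then smooth with the sliding window."""
    return window_profile([per_residue_table[aa] for aa in sequence], width)
```

For example, with a toy table `{"A": 1.0, "G": 0.0}`, the sequence `"AAGGGAA"` gives a profile of 0.4 for the three central residues and `None` at both ends.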



IV. DISCUSSION AND CONCLUSIONS 

The involvement of MyBP-C in Familial Hypertrophic Cardiomyopathy motivated a study of the folding of domain C5 through equilibrium MD simulations, to gain insight into the role of three FHC-related mutations: Asn115Lys, Arg14His and Arg28His. As a member of the Immunoglobulin family, domain C5 lends itself to being reasonably modeled through a Go-like force field. We assessed the thermodynamic impact of a mutation through the magnitude of the shift in the folding temperature Tf. Our results show that, among the three FHC-related mutations we examined, Asn115Lys causes the largest decrease in Tf, in agreement with the NMR spectra recorded by Idowu [iBl, which signal the absence of structure even at low temperature. Conversely, the Tf shift induced by the Arg28His mutation is modest, while the destabilization induced by Arg14His is negligible, as its thermogram is almost perfectly superposable on that of the WT. This finding suggests that these two mutants have very little effect on protein stability and that their pathological role must be sought elsewhere. Both Arg14His and Arg28His imply the removal of three contacts; in the Go-like approach their impact could be resolved only partially, and only through the introduction of heterogeneous energetic couplings, which suggests that a more refined analysis is warranted.
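Comparing mutants through the shift in Tf requires a working definition of the folding temperature. A common choice in Go-model studies, assumed here rather than taken from the text, is the position of the specific-heat peak of the thermogram. In that setting Cv is obtained from energy fluctuations, Cv = (<E^2> - <E>^2)/(k_B T^2). The sketch below uses hypothetical function names and reduced units (k_B = 1). It locates the Cv maximum on a temperature grid, refines it with a three-point parabola, and returns the mutant-minus-WT shift, so a negative value signals destabilization.

```python
from statistics import pvariance


def heat_capacity(energies, temperature, k_b=1.0):
    """Cv = (<E^2> - <E>^2) / (k_B T^2) from energies sampled at one temperature."""
    return pvariance(energies) / (k_b * temperature**2)


def folding_temperature(temperatures, heat_capacities):
    """Temperature of the Cv maximum, refined by a parabola through the peak and its neighbours."""
    i = max(range(len(heat_capacities)), key=heat_capacities.__getitem__)
    if i == 0 or i == len(heat_capacities) - 1:
        return temperatures[i]  # peak at the edge of the scanned range: no refinement possible
    x0, x1, x2 = temperatures[i - 1:i + 2]
    y0, y1, y2 = heat_capacities[i - 1:i + 2]
    # Vertex of the parabola through (x0, y0), (x1, y1), (x2, y2).
    num = (x1 - x0) ** 2 * (y1 - y2) - (x1 - x2) ** 2 * (y1 - y0)
    den = (x1 - x0) * (y1 - y2) - (x1 - x2) * (y1 - y0)
    return x1 if den == 0 else x1 - 0.5 * num / den


def tf_shift(temperatures, cv_mutant, cv_wild_type):
    """Delta Tf = Tf(mutant) - Tf(WT); negative values indicate destabilization."""
    return (folding_temperature(temperatures, cv_mutant)
            - folding_temperature(temperatures, cv_wild_type))
```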

Further insight into the role of Arg14, located at the N-terminus of the C5 domain, was gained by studying the Δ1-7 deletion mutant. The significant decrease in Tf of the truncated domain indicates that the N-terminal region, with its 10-residue insert typical of the cardiac isoform, is not just a linker between the C4 and C5 domains but makes an important contribution to stability. However, the low betweenness of the N-terminal residues indicates that they may be involved in contacts forming a subgraph only weakly connected to the core of the contact network. It is thus possible that the N-terminal contacts appear only when the C5 domain is dissected from the rest of the protein, and that the natural role of this section is more related to binding with domain C8, complementing the negatively charged CFGA' surface [T^.
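The betweenness argument refers to the residue contact network. The sketch below is an illustration only, built on assumptions not stated in the text. Nodes are residues, and an edge joins two residues whose C-alpha atoms lie closer than a cutoff (8 Å by default, a hypothetical choice, with covalent neighbours included). Betweenness is the unnormalized shortest-path betweenness computed with Brandes' algorithm. The original analysis may have used different contact criteria or a normalized measure, and the function names are hypothetical.

```python
import math
from collections import deque


def contact_graph(coords, cutoff=8.0, min_seq_sep=1):
    """Undirected residue network: edge if C-alpha distance < cutoff (hypothetical criteria)."""
    n = len(coords)
    adj = {i: set() for i in range(n)}
    for i in range(n):
        for j in range(i + min_seq_sep, n):
            if math.dist(coords[i], coords[j]) < cutoff:
                adj[i].add(j)
                adj[j].add(i)
    return adj


def betweenness(adj):
    """Brandes' algorithm for unweighted, undirected graphs (unnormalized)."""
    bc = dict.fromkeys(adj, 0.0)
    for s in adj:
        stack, pred = [], {v: [] for v in adj}
        sigma = dict.fromkeys(adj, 0)   # number of shortest paths from s
        dist = dict.fromkeys(adj, -1)   # BFS distance from s
        sigma[s], dist[s] = 1, 0
        queue = deque([s])
        while queue:
            v = queue.popleft()
            stack.append(v)
            for w in adj[v]:
                if dist[w] < 0:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    pred[w].append(v)
        delta = dict.fromkeys(adj, 0.0)
        while stack:  # accumulate dependencies in order of decreasing distance
            w = stack.pop()
            for v in pred[w]:
                delta[v] += sigma[v] / sigma[w] * (1.0 + delta[w])
            if w != s:
                bc[w] += delta[w]
    # Each unordered pair of endpoints was counted once from each side.
    return {v: b / 2.0 for v, b in bc.items()}
```

Residues with low scores, like the N-terminal ones discussed above, lie on few shortest paths between other residues, which is consistent with a weakly connected subgraph.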

This potential role of the FHC-related mutations in binding, rather than in folding, is supported by the analysis of Φ-values, which are significantly higher in the CFGA' sheet, where Asn115 is located, than in the BDE sheet, which includes Arg28.
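Φ-values are defined kinetically as Φ = ΔΔG‡/ΔΔG_eq. In native-centric simulations, a per-residue structural analogue is often used instead. The sketch below assumes that analogue, built from the mean number of native contacts formed by residue i in the transition-state (TS), unfolded (U) and native (N) ensembles, and then averages Φ over residue sets supplied by the user for the two β-sheets. Function names, residue assignments and numbers in the usage are hypothetical placeholders, not data from this study.

```python
from statistics import mean


def structural_phi(n_ts, n_unfolded, n_native):
    """Phi_i = (<n_i>_TS - <n_i>_U) / (<n_i>_N - <n_i>_U) for one residue."""
    if n_native == n_unfolded:
        raise ValueError("residue forms no additional native contacts on folding")
    return (n_ts - n_unfolded) / (n_native - n_unfolded)


def mean_phi_by_region(phi, regions):
    """Average Phi over each named set of residue indices (e.g. the two beta-sheets)."""
    return {name: mean(phi[i] for i in residues) for name, residues in regions.items()}
```

For instance, `mean_phi_by_region({1: 0.8, 2: 0.6, 3: 0.1, 4: 0.3}, {"CFGA'": [1, 2], "BDE": [3, 4]})` returns approximately 0.7 for the first set and 0.2 for the second.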

A final issue considered in the present work is the analysis of the CD loop, which is responsible for the low stability of the C5 domain as compared with other Ig domains. The charge imbalance and the low hydrophobicity of the CD loop, accompanied by a low average number of native contacts and low values of the artificial parameters introduced by Galzitskaya, are a clear indication that the C5 domain of MyBP-C can be considered a natively unfolded protein, i.e., a protein that lacks a compact, globular structure under physiological conditions [H, m, 113, HH, Hi-. Therefore, the role of the CD loop of the C5 domain of the cardiac isoform of MyBP-C must be reconsidered within the framework of the peculiar properties of natively unfolded proteins. As cardiac MyBP-C co-purifies with the Calmodulin class-II (CaM-II)-like Kinase, the CD loop may represent an SH3 domain recognition region [m . [ssj. In experiments and simulations performed on the C5 domain alone, the CD loop, owing to its high mobility, destabilizes the protein and lowers its folding temperature. In vivo, however, the CD loop may fold upon binding with the CaM-II-like Kinase, so that the thermodynamic stability and the folding temperature of the protein may be similar to those of the other Ig domains. Our results also suggest that the cardiac C5 domain might regulate the activity of the CaM-II-like Kinase, whose docking may trigger the folding of the CD loop. In such a case, the C5 domain may be not only a structural component of the Moolman-Smook collar (Fig. 1) but may also play an important role in the regulation of muscular contraction.